Other Schema Components

Occurrence Indicators

Occurrence indicators are used within element content models to specify how many times an element may appear at a given location. The indicators available to schema developers are listed below:

Occurrence Indicator	Meaning	Summary
none	The element must appear once and only once.	Neither optional nor repeatable.
?	The element (or group of elements) may appear zero or one times. The element is optional, but is only allowed to appear once.	Optional but not repeatable.
+	The element (or group of elements) must appear one or more times. The element is required to appear at least once, but multiple consecutive occurrences may be present.	Repeatable but not optional.
*	The element (or group of elements) may appear zero or more times. The element can appear as many times consecutively as needed, or even zero times.	Repeatable and optional.

In combination with sequence indicators (see below), these choices make it possible to describe complex structures. For example, a memo might allow one or more entries in its To: and From: fields, any number of entries (including none) in its Cc: field, a single entry for the subject, required content for the body, and an optional set of initials at the bottom for the typist. A MEMO element might therefore have the following content model:

(To+, From+, Cc*, Subject, Body, Typist?)

This declaration requires the MEMO element to contain, in sequence, one or more To elements, one or more From elements, zero or more Cc elements, a single Subject element, a single Body element, and zero or one Typist elements.
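An element content model behaves like a regular expression over the names of an element's children, so each occurrence indicator corresponds to a regex quantifier. The sketch below is a hand translation of the MEMO model into a Python regular expression, offered only as an illustration: the names `MEMO_MODEL` and `memo_is_valid` are hypothetical, and this is not a substitute for a validating XML parser.

```python
import re

# Hand translation of the MEMO content model
#   (To+, From+, Cc*, Subject, Body, Typist?)
# into a regular expression. Child element names are written out as one
# string, each name followed by a single space, so the DTD occurrence
# indicators map directly onto regex quantifiers.
MEMO_MODEL = re.compile(
    r"(?:To )+"      # To+      one or more
    r"(?:From )+"    # From+    one or more
    r"(?:Cc )*"      # Cc*      zero or more
    r"Subject "      # Subject  exactly once
    r"Body "         # Body     exactly once
    r"(?:Typist )?"  # Typist?  zero or one
)


def memo_is_valid(children):
    """Return True if the list of child element names fits the MEMO model."""
    text = "".join(name + " " for name in children)
    return MEMO_MODEL.fullmatch(text) is not None


print(memo_is_valid(["To", "From", "Subject", "Body"]))  # True
print(memo_is_valid(["From", "Subject", "Body"]))        # False: no To element
```

Order matters as well as counts: `["To", "From", "Body", "Subject"]` is rejected because Subject must come before Body.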

Sequence Indicators

Sequence indicators are used within element content models to specify the order in which elements may appear and the choices among them. The three sequence indicators available to schema developers are listed below:

Sequence Indicator	Meaning	Summary
|	Can be read as 'or', allowing the document to contain one of the elements or groups of elements listed.	Choice
,	Can be read as 'followed by', requiring that the elements or groups of elements appear in the precise sequence indicated.	Sequence
( )	Groups elements, allowing a set of choices or a sequence to be used anywhere that a single element can appear.	Group

Some very simple documents can be expressed as a pure sequence. A date might be expressed as a Month, Day, and Year element, for example. This could be done using the comma sequence indicator:

(Month, Day, Year)

In other cases, a document needs to provide choices. A chapter might require an introduction, but then permit any combination of sections or sidebars. Sequence indicators, in combination with occurrence indicators, can make this possible. The content model for a chapter element might therefore look like:

(Intro, (Section | Sidebar)*)

The Intro element must appear exactly once, at the start of the chapter, and Section or Sidebar elements may then follow in any order and in any number. (This model is read as "an Intro element followed by zero or more Section or Sidebar elements.")
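Because element content models behave like regular expressions over child element names, the sequence indicators map onto regex operators: the comma becomes concatenation, the vertical bar becomes alternation, and parentheses become a non-capturing group. The following Python sketch translates a content model string into a regex under that assumption. It handles element-only content (no #PCDATA, EMPTY, or ANY), and the helper names `model_to_regex` and `matches` are hypothetical.

```python
import re

# Names and punctuation in a simplified DTD element content model.
TOKEN = re.compile(r"[A-Za-z_:][\w.:-]*|[(),|?*+]")


def model_to_regex(model):
    """Translate an element-only content model such as
    '(Intro, (Section | Sidebar)*)' into a compiled regex over
    space-terminated child element names.

    Mixed content (#PCDATA), EMPTY, and ANY are not handled.
    """
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")   # ( )  grouping
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            pass                  # ,    'followed by' is plain concatenation
        elif tok == "|":
            parts.append("|")     # |    'or' is alternation
        elif tok in ("?", "*", "+"):
            parts.append(tok)     # occurrence indicators carry over unchanged
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one element name
    return re.compile("".join(parts))


def matches(model, children):
    """Return True if the list of child element names fits the content model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None


chapter = "(Intro, (Section | Sidebar)*)"
print(matches(chapter, ["Intro", "Sidebar", "Section", "Section"]))  # True
print(matches(chapter, ["Section", "Intro"]))                        # False
print(matches("(Month, Day, Year)", ["Month", "Day", "Year"]))       # True
```

Mixing `,` and `|` directly in one group is not allowed in a DTD content model, so the translation does not need to worry about precedence between concatenation and alternation; explicit parentheses always separate them.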

Notations

Notations deal with content other than XML. Notation declarations are required for use with unparsed entities and may also be referenced by attributes of type NOTATION to identify the format of an element's content. A notation is used to declare a particular class of data and associate it with an external program. Notations supply names for types and also identifiers (system, public, or both) that provide information on those types.

System identifiers are not required for notations, though at least one identifier (system or public) must appear. The syntax for a notation declaration in a typical DTD (Document Type Definition) might look like this:

<!NOTATION image_jpeg SYSTEM "http://www.isi.edu/in-notes/iana/assignments/media-types/image/jpeg">

In a DTD, an attribute can be declared with type NOTATION, so that its value refers to a notation declared elsewhere in the DTD:

<!ATTLIST IMAGE src NOTATION (image_jpeg) #REQUIRED>

Within the XML document instance itself, the element will then look like this:

<IMAGE src="image_jpeg"></IMAGE>

So, if attributes are used with notation values, they must be declared correctly in the DTD, using both a notation declaration and a notation attribute declaration.
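To see how a parser reports these declarations, the sketch below feeds the notation and attribute declarations from this section to Python's non-validating `xml.parsers.expat` parser, inside a hypothetical internal DTD subset. Two assumptions are added: the attribute declaration carries a `#REQUIRED` default, since every attribute declaration needs a default declaration, and `IMAGE` is given `(#PCDATA)` content, since XML 1.0 does not allow a NOTATION attribute on an element declared EMPTY. The function name `collect_notation_info` is hypothetical.

```python
import xml.parsers.expat

# Hypothetical sample document: the declarations from the text, placed in an
# internal DTD subset so the example needs no external files.
DOC = """<?xml version="1.0"?>
<!DOCTYPE IMAGE [
  <!ELEMENT IMAGE (#PCDATA)>
  <!NOTATION image_jpeg SYSTEM "http://www.isi.edu/in-notes/iana/assignments/media-types/image/jpeg">
  <!ATTLIST IMAGE src NOTATION (image_jpeg) #REQUIRED>
]>
<IMAGE src="image_jpeg"></IMAGE>
"""


def collect_notation_info(doc):
    notations = {}       # notation name -> system identifier
    notation_attrs = []  # (element name, attribute name, declared type)
    used = {}            # attribute values found on start tags

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_attlist(elname, attname, att_type, default, required):
        # Expat reports notation attribute types as a string beginning "NOTATION".
        if att_type.startswith("NOTATION"):
            notation_attrs.append((elname, attname, att_type))

    def on_start(name, attrs):
        used.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return notations, notation_attrs, used


notations, notation_attrs, used = collect_notation_info(DOC)
print(notations)
print(notation_attrs)
print(used)
```

Because expat does not validate, it reports the declarations and the attribute value but does not itself check that `image_jpeg` names a declared notation; that check is the job of a validating parser.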

Entities

An XML document may consist of one or more physical storage units called entities. When a parser encounters a reference to a parsed entity, it replaces the reference with the entity's content; an unparsed entity, such as a graphic or sound clip, is not inserted by the parser but is passed along, by name, to the application. There are two kinds of XML entities: general entities and parameter entities. Both act as a kind of shortcut. A general entity is defined in a DTD, but references to it appear in the document instance, whereas a parameter entity can be used only within a DTD. Entities are also classified in two further ways, giving four other types of entities:

Internal entities - entities whose content is given entirely within the entity declaration in the DTD
External entities - entities whose content is stored outside the document, in a separate resource identified by a system or public identifier
Parsed entities - entities that the XML processor parses, so that their content becomes part of the document. A parsed entity can be referenced within an element's content.
Unparsed entities - entities that are not parsed by the XML processor; instead, they are handed off to another application for processing (e.g., a binary image file). An unparsed entity can be referenced only by name, as the value of an attribute declared with type ENTITY or ENTITIES.

Not only is every entity either parsed or unparsed, but every entity is also either internal or external. For example, you could have an internal parsed entity or an external unparsed entity, although unparsed entities are always external.
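These categories can be observed with Python's `xml.parsers.expat`, whose `EntityDeclHandler` callback reports each entity declaration: an entity declared with a replacement-text value is internal, one declared without a value (it has a system identifier instead) is external, and one with a notation name is unparsed. The DTD, entity names, and file names below are hypothetical, and `classify_entities` is an illustrative helper. The external entities are only declared, never referenced, so no files need to exist; the parser does expand the reference to the internal general entity `company` in the element's text.

```python
import xml.parsers.expat

# Hypothetical internal DTD subset declaring one entity of each kind.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!NOTATION image_jpeg SYSTEM "image/jpeg">
  <!ENTITY company "Acme Corporation">
  <!ENTITY boilerplate SYSTEM "boilerplate.xml">
  <!ENTITY logo SYSTEM "logo.jpg" NDATA image_jpeg>
  <!ENTITY % memo.fields "To+, From+, Cc*">
]>
<memo>Sent on behalf of &company;.</memo>
"""


def classify_entities(doc):
    kinds = {}  # entity name -> (general/parameter, internal/external, parsed/unparsed)
    text = []   # character data, with internal entity references expanded

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        scope = "parameter" if is_param else "general"
        location = "internal" if value is not None else "external"
        parsing = "unparsed" if notation else "parsed"
        kinds[name] = (scope, location, parsing)

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.CharacterDataHandler = text.append
    parser.Parse(doc, True)
    return kinds, "".join(text)


kinds, text = classify_entities(DOC)
for name, kind in kinds.items():
    print(name, kind)
print(text)
```

The parameter entity `memo.fields` is declared but not referenced here; in practice, parameter entities are mostly used in external DTDs, because XML 1.0 does not allow parameter-entity references inside markup declarations in the internal subset.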

Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516